Optimizing Abstaining Classifiers using ROC Analysis. Tadek Pietraszek / 'tʌ dek pɪe 'trʌ ʃek / ICML 2005 August 9, 2005
1 IBM Zurich Research Laboratory, GSAL Optimizing Abstaining Classifiers using ROC Analysis Tadek Pietraszek / 'tʌ dek pɪe 'trʌ ʃek / pie@zurich.ibm.com ICML 2005 August 9, 2005
2 To classify, or not to classify: that is the question.
3 Motivation
- Abstaining classifiers are classifiers that in certain cases can refrain from classification; they are similar to human experts who can say "I don't know."
- In many domains such experts are preferred to ones that always make a decision and are sometimes wrong (think of a doctor).
- Machine learning has frequently used abstaining classifiers ([FH04], [GL00], [PMAS94], [Tort00]), also implicitly (e.g., active learning, delegating classifiers, triskels (ICML05)).
- Q1: How do we optimally select abstaining classifiers?
- Q2: How do we compare normal and abstaining classifiers?
4 Outline
- Tri-State Classifier
  1. Cost-Based Model
  2. Bounded-Abstention Model
  3. Bounded-Improvement Model
5 Notation
- A binary classifier C is a function C: I -> {+, -}, where i in I is an instance.
- A ranker R (a.k.a. scoring classifier) is a function attaching a rank to an instance, R: I -> R; it can be converted to a binary classifier C_tau using a threshold tau: C_tau(i) = + iff R(i) >= tau.
- An abstaining binary classifier A is a classifier that in certain cases can refrain from classification. We denote this as attaching a third class, "?".
6 ROC Background
- Evaluates model performance under all class and cost distributions: a 2D plot (X: false positive rate, Y: true positive rate).
- A classifier C corresponds to a single point (fp, tp) on the ROC curve.
- A classifier C_tau (or a machine learning method L_tau) has a parameter tau; varying it produces multiple points.
- Therefore we consider a ROC curve a function f: tau -> (fp_tau, tp_tau).
- We can find an inverse function f^-1: (fp_tau, tp_tau) -> tau.
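The thresholding described above can be sketched in a few lines of Python. This is an illustrative helper (`roc_points` is not from the talk): sweeping tau over the observed scores turns a ranker into the family of binary classifiers C_tau and yields one (fp, tp) point per threshold.

```python
def roc_points(scores, labels):
    """Compute (fp, tp) rate pairs for every threshold tau of a ranker.

    Each tau defines a binary classifier C_tau(i) = + iff R(i) >= tau;
    labels: 1 = positive, 0 = negative.
    """
    P = sum(1 for y in labels if y == 1)
    N = len(labels) - P
    points = []
    for tau in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= tau and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= tau and y == 0)
        points.append((fp / N, tp / P))
    # (0, 0) corresponds to tau = +infinity (classify everything as -)
    return [(0.0, 0.0)] + points

# toy example: scores for 2 positives and 2 negatives
pts = roc_points([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0])
```

Note the curve always runs from (0, 0) (abstain from predicting +) to (1, 1) (predict + for everything).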
7 ROC Background
- ROC Convex Hull (ROCCH) [PF98]: a piecewise-linear, convex-down curve f_R with the following properties:
  - f_R(0) = 0, f_R(1) = 1.
  - The slope of f_R is monotonically non-increasing.
  - Assume that for any slope value m there exists a point where f_R has slope m: vertices have "slopes" assuming values between the slopes of adjacent edges.
  - Assume sentinel edges: a 0th edge with infinite slope and an (n+1)th edge with slope 0.
- We will use the ROCCH instead of the ROC curve.
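A ROCCH with these properties can be computed with a standard upper-hull sweep. This sketch (the `rocch` helper is mine, not the talk's) keeps exactly the points whose left-to-right slopes are non-increasing:

```python
def rocch(points):
    """Upper convex hull of ROC points (the ROCCH).

    Returns the vertices left to right; slopes between consecutive
    vertices are monotonically non-increasing.
    """
    pts = sorted(set(points))
    hull = []
    for p in pts:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # pop hull[-1] if it lies on or below the segment hull[-2] -> p
            if (x2 - x1) * (p[1] - y1) >= (p[0] - x1) * (y2 - y1):
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

# (0.3, 0.3) is below the hull and is discarded
hull = rocch([(0, 0), (0.1, 0.4), (0.3, 0.3), (0.5, 0.9), (1, 1)])
```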
8 Some Definitions
- Confusion matrix (A = Actual, C = Classified as):

  A/C   +    -
  +     TP   FN   (P)
  -     FP   TN   (N)

  with rates tp = TP / (TP + FN), fn = FN / (TP + FN), fp = FP / (FP + TN).
- Cost matrix:

  A/C   +     -
  +     0     c12
  -     c21   0

  with cost ratio CR = c21 / c12.
9 Cost-Minimizing Criterion for One Classifier
- Known iso-performance lines [PF98]: the optimal classifier lies where the ROC curve has slope

  f'_ROC(fp) = CR * N / P
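On a ROCCH this criterion reduces to checking the vertices: sliding an iso-performance line of slope CR * N / P down onto the hull touches the vertex with minimum expected cost. A minimal sketch (the helper name and the constant-factor scaling are my choices):

```python
def best_operating_point(rocch_vertices, CR, P, N):
    """Pick the ROCCH vertex minimizing expected cost.

    Equivalent to the iso-performance line of slope CR * N / P [PF98];
    cost is computed up to the constant factor c12 / (P + N).
    """
    def cost(v):
        fp, tp = v
        return fp * N * CR + (1.0 - tp) * P   # FP*c21/c12 + FN
    return min(rocch_vertices, key=cost)

hull = [(0.0, 0.0), (0.1, 0.4), (0.5, 0.9), (1.0, 1.0)]
# cheap false positives -> operate high on the curve;
# expensive false positives (CR = 5) -> retreat toward (0, 0)
```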
10 Outline
- Tri-State Classifier
  1. Cost-Based Model
  2. Bounded-Abstention Model
  3. Bounded-Improvement Model
11 Metaclassifier A_{alpha,beta}
- IDEA: construct the classifier from two binary classifiers C_alpha, C_beta:

  A_{alpha,beta}(x) = +  if C_alpha(x) = + and C_beta(x) = +
                    = -  if C_alpha(x) = - and C_beta(x) = -
                    = ?  otherwise

  where C_alpha, C_beta are such that, for all x, (C_alpha(x) = + implies C_beta(x) = +) and (C_beta(x) = - implies C_alpha(x) = -).

  C_alpha \ C_beta   +            -
  +                  +            impossible
  -                  ?            -

- Can we optimally select C_alpha and C_beta?
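When both member classifiers come from one ranker R with two thresholds tau_alpha >= tau_beta, the implication condition holds by construction and the "impossible" cell can never occur. An illustrative sketch (helper name mine):

```python
def make_abstaining_classifier(R, tau_alpha, tau_beta):
    """Build A_{alpha,beta} from one ranker R and two thresholds.

    tau_alpha >= tau_beta, so C_alpha (the stricter threshold) satisfies
    C_alpha(x) = + implies C_beta(x) = +.
    """
    assert tau_alpha >= tau_beta

    def A(x):
        c_alpha = '+' if R(x) >= tau_alpha else '-'
        c_beta = '+' if R(x) >= tau_beta else '-'
        if c_alpha == '+' and c_beta == '+':
            return '+'
        if c_alpha == '-' and c_beta == '-':
            return '-'
        return '?'  # the two classifiers disagree: abstain
    return A

# identity ranker: scores in [0.3, 0.7) fall in the abstention band
A = make_abstaining_classifier(lambda x: x, tau_alpha=0.7, tau_beta=0.3)
```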
12 Requirements on the ROC Curve
- Requirement: for a ROC curve and any two classifiers C_alpha and C_beta corresponding to points (fp_alpha, tp_alpha) and (fp_beta, tp_beta) such that fp_alpha <= fp_beta:

  for all x: (C_alpha(x) = + implies C_beta(x) = +) and (C_beta(x) = - implies C_alpha(x) = -)

- These conditions are the same as used by [FlachWu03] and are met in particular if classifiers C_alpha and C_beta are constructed from a single ranker R.
13 Optimal Metaclassifier A_{alpha,beta}
- How do we compare binary classifiers and abstaining classifiers? How do we select an optimal classifier?
- No clear answer. Either:
  - use a cost-based model (Cost-Based Model), or
  - use boundary conditions:
    - maximum fraction of instances classified as "?" (Bounded-Abstention Model),
    - maximum misclassification cost (Bounded-Improvement Model).
14 Cost-Based Model
- 2x3 cost matrix (A = Actual, C = Classified as):

  A/C   +     -     ?
  +     0     c12   c13
  -     c21   0     c23

- C_alpha has confusion matrix (TP_alpha, FN_alpha, FP_alpha, TN_alpha); C_beta has (TP_beta, FN_beta, FP_beta, TN_beta).
- Important properties: fp_alpha <= fp_beta and fn_beta <= fn_alpha.
15 Selecting the Optimal Classifier
- Similar criterion: minimize the cost

  rc = (1 / (P + N)) [ FN_beta c12 + FP_alpha c21 + (FN_alpha - FN_beta) c13 + (FP_beta - FP_alpha) c23 ]

  (the first two terms are misclassification costs; the last two are the costs of the instances the two classifiers disagree on, i.e., the abstained instances).
- Setting the partial derivatives with respect to fp_beta and fp_alpha to zero gives

  f'_ROC(fp_beta)  = (c23 / (c12 - c13)) * N / P
  f'_ROC(fp_alpha) = ((c21 - c23) / c13) * N / P
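When the search is restricted to ROCCH vertices, the cost rc above can simply be evaluated over all ordered vertex pairs. This is a brute-force stand-in for the slope conditions, not the paper's algorithm (helper name and example numbers are mine):

```python
def optimal_pair(vertices, P, N, c12, c21, c13, c23):
    """Search ROCCH vertex pairs (alpha, beta) with fp_alpha <= fp_beta
    for the pair minimizing the 2x3 cost-matrix cost rc."""
    best, best_rc = None, float('inf')
    for i, (fpa, tpa) in enumerate(vertices):
        for (fpb, tpb) in vertices[i:]:
            FP_a, FN_a = fpa * N, (1 - tpa) * P
            FP_b, FN_b = fpb * N, (1 - tpb) * P
            rc = (FN_b * c12 + FP_a * c21
                  + (FN_a - FN_b) * c13 + (FP_b - FP_a) * c23) / (P + N)
            if rc < best_rc:
                best, best_rc = ((fpa, tpa), (fpb, tpb)), rc
    return best, best_rc

hull = [(0.0, 0.0), (0.1, 0.4), (0.5, 0.9), (1.0, 1.0)]
# with cheap abstention (c13 = c23 = 0.2) the optimum abstains
pair, rc = optimal_pair(hull, 100, 100, 1.0, 1.0, 0.2, 0.2)
```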
16 Cost-Based Model: a Simulated Example (figures)
- Left: ROC curve with the two optimal classifiers A and B, annotated with the slope conditions f'_ROC(fp_alpha) and f'_ROC(fp_beta).
- Right: misclassification cost for different combinations of A and B, with minima at FP(a) and FP(b).
17 Understanding Cost Matrices
- The 2x2 cost matrix is well known. The 2x3 cost matrix has some interesting properties: e.g., under which conditions the optimal classifier is an abstaining classifier.
- Our derivation is valid for

  (c13 < c12) and (c23 < c21) and (c13 c21 + c23 c12 < c12 c21)   (*)

- We can prove that if this condition is not met, the classifier is a trivial binary classifier.
18 Cost Matrices: Interesting Cases
- How do we set c13, c23 so that the classifier is a nontrivial abstaining classifier?
- Two interesting cases:
  - Symmetric case (c13 = c23):  c13 = c23 < c12 c21 / (c12 + c21)
  - Proportional case (c13 / c23 = c12 / c21):  c13 < c12 / 2 and c23 < c21 / 2
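Both special cases follow from the general validity condition (*) above; a few lines of Python make the bounds concrete (assuming my reconstruction of (*); the helper is illustrative):

```python
def abstention_is_nontrivial(c12, c21, c13, c23):
    """Check condition (*): abstaining beats a trivial binary
    classifier only when abstention is cheap enough."""
    return c13 < c12 and c23 < c21 and c13 * c21 + c23 * c12 < c12 * c21

# symmetric case c13 == c23: nontrivial iff c13 < c12*c21 / (c12 + c21);
# with c12 = c21 = 1 the bound is 0.5
assert abstention_is_nontrivial(1.0, 1.0, 0.4, 0.4)
assert not abstention_is_nontrivial(1.0, 1.0, 0.6, 0.6)
```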
19 Bounded Models
- Problem: a 2x3 cost matrix is not always given and would have to be estimated; however, the classifier is very sensitive to c13, c23.
- Instead, find other optimization criteria for an abstaining classifier using a standard 2x2 cost matrix: calculate the misclassification cost per classified instance.
- Follow the same reasoning to find the optimal classifier.
20 Bounded Models: Equations
- We obtain the following equations, determining the relationship between the abstention fraction k and the cost rc as a function of the classifiers C_alpha, C_beta:

  k  = (1 / (P + N)) [ (fp_beta - fp_alpha) N + (fn_alpha - fn_beta) P ]
  rc = (FP_alpha c21 + FN_beta c12) / ( (P + N) (1 - k) )

- Constrain k, minimize rc: bounded-abstention model.
- Constrain rc, minimize k: bounded-improvement model.
- There is no algebraic solution; we need to optimize numerically.
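The bounded-abstention optimization (constrain k, minimize rc) can be approximated by a coarse grid search over (fp_alpha, fp_beta) on the ROCCH. This is only a sketch of the numerical optimization the talk mentions, not the paper's algorithm; the helper name and the toy concave curve are mine:

```python
def bounded_abstention(f, P, N, c12, c21, k_max, grid=101):
    """Among pairs fp_alpha <= fp_beta whose abstention fraction k
    stays below k_max, minimize the cost rc per classified instance.

    f is the ROCCH as a function fp -> tp (increasing, concave).
    """
    fps = [i / (grid - 1) for i in range(grid)]
    best_rc, best_pair = float('inf'), None
    for i, fpa in enumerate(fps):
        for fpb in fps[i:]:
            tpa, tpb = f(fpa), f(fpb)
            # fraction of instances the pair abstains on
            k = ((fpb - fpa) * N + (tpb - tpa) * P) / (P + N)
            if k > k_max or k >= 1.0:
                continue
            # misclassification cost per classified instance
            rc = (fpa * N * c21 + (1 - tpb) * P * c12) / ((P + N) * (1 - k))
            if rc < best_rc:
                best_rc, best_pair = rc, (fpa, fpb)
    return best_rc, best_pair

# f(fp) = sqrt(fp) is a simple concave stand-in for a ROCCH;
# allowing 20% abstention lowers the per-instance cost
rc0, _ = bounded_abstention(lambda fp: fp ** 0.5, 100, 100, 1.0, 1.0, k_max=0.0)
rc2, pair = bounded_abstention(lambda fp: fp ** 0.5, 100, 100, 1.0, 1.0, k_max=0.2)
```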
21 Bounded-Abstention Model
- Among classifiers abstaining for no more than a fraction k_MAX of the instances, find the one that minimizes rc.
- A useful application is real-time processing, where the non-classified instances will be processed by another classifier with a limited processing speed.
- We can prove that the solution is not limited to vertices of the ROCCH.
22 Bounded-Abstention Model: a Simulated Example (figures)
- Left: ROC curve (tp vs. fp) with the two optimal classifiers A and B.
- Right: misclassification cost in the bounded case (fraction of "?" <= 0.2), with minima at FP(a) and FP(b).
23 Bounded-Improvement Model
- Among classifiers having a misclassification cost not higher than rc_MAX, find the one that abstains for the smallest number of instances.
- Useful, e.g., in the medical domain, where a test must achieve a certain lower misclassification cost while allowing for non-classified instances.
- For the evaluation we use f such that rc_MAX = (1 - f) rc, where rc is the cost of the optimal binary classifier.
- We can prove that the solution is not limited to vertices of the ROCCH.
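The dual search (constrain rc, minimize k) can be sketched the same way as the bounded-abstention grid search; again this is an illustrative stand-in for the numerical optimization, with helper name and toy curve of my choosing:

```python
def bounded_improvement(f, P, N, c12, c21, rc_max, grid=101):
    """Among pairs fp_alpha <= fp_beta whose per-classified-instance
    cost rc stays below rc_max, minimize the abstention fraction k.

    f is the ROCCH as a function fp -> tp (increasing, concave).
    """
    fps = [i / (grid - 1) for i in range(grid)]
    best_k, best_pair = float('inf'), None
    for i, fpa in enumerate(fps):
        for fpb in fps[i:]:
            tpa, tpb = f(fpa), f(fpb)
            k = ((fpb - fpa) * N + (tpb - tpa) * P) / (P + N)
            if k >= 1.0:
                continue
            rc = (fpa * N * c21 + (1 - tpb) * P * c12) / ((P + N) * (1 - k))
            if rc <= rc_max and k < best_k:
                best_k, best_pair = k, (fpa, fpb)
    return best_k, best_pair

# demand a cost below what any binary classifier on f(fp) = sqrt(fp)
# can reach (its optimum is 0.375): some abstention becomes necessary
k, pair = bounded_improvement(lambda fp: fp ** 0.5, 100, 100, 1.0, 1.0, rc_max=0.3)
```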
24 Bounded-Improvement Model: a Simulated Example (figures)
- Left: ROC curve with the two optimal classifiers A and B.
- Right: fraction of skipped instances for different combinations of A and B, with minima at FP(a) and FP(b).
25 Experiments
- Tested with 15 UCI KDD datasets, using averaged cross-validation.
- Each model uses one independent parameter: c13 = c23, k, or f.
- Classifier: Bayesian classifier from Weka [WF00].
- Numerical calculations and optimization in R.
- Showing results for one representative dataset.
26 Building an Abstaining Classifier
- Inputs: training instances plus either (1) a 2x3 cost matrix or (2) a 2x2 cost matrix with a fraction k or f.
- For each fold of an n-fold cross-validation:
  1. Build a classifier on the training set and classify the testing set.
  2. Collect statistics and build the ROC curve.
  3. Find the thresholds (per model) on the ROC and construct the tri-state classifier.
- Repeat m times and average; a binary classifier is built the same way for comparison.
27 Results: Cost-Based Model (figures, ionosphere.arff)
- Cost improvement and fraction of instances skipped, plotted against the cost value c13 = c23.
28 Results: Bounded-Abstention Model (figures, ionosphere.arff)
- Relative cost improvement and misclassification cost (rc), plotted against the fraction skipped (k).
29 Results: Bounded-Improvement Model (figures, ionosphere.arff)
- Fraction skipped (k), plotted against the relative cost improvement (f) and the misclassification cost (rc).
30 Summary
- Abstaining classifier as a metaclassifier:
  - Cost-based model
  - Bounded-improvement model
  - Bounded-abstention model
- Methodically tested and shown to work (in all three models): multiple data sets (UCI KDD), cross-validation.
- The idea fits our alert classification system (see: Pietraszek 2004, "Using Adaptive Alert Classification to Reduce False Positives in Intrusion Detection").
31 IBM Zurich Research Laboratory, GSAL END
32 Bibliography (1)
- [Chow70] Chow, C. (1970). On optimum recognition error and reject tradeoff. IEEE Transactions on Information Theory, 16.
- [Dietterich98] Dietterich, T. G. (1998). Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10.
- [Fawcett03] Fawcett, T. (2003). ROC graphs: Notes and practical considerations for researchers. Technical Report, HP Laboratories.
- [FFH04] Ferri, C., Flach, P., Hernandez-Orallo, J. (2004). Delegating classifiers. Proceedings of the 21st International Conference on Machine Learning (ICML'04). Alberta, Canada: Omnipress.
- [FerriHernandez04] Ferri, C., Hernandez-Orallo, J. (2004). Cautious classifiers. Proceedings of ROC Analysis in Artificial Intelligence, 1st International Workshop (ROCAI-2004). Valencia, Spain.
- [FlachWu03] Flach, P. A., Wu, S. (2003). Repairing concavities in ROC curves. Proc. UK Workshop on Computational Intelligence. Bristol, UK.
- [GambergerLavrac00] Gamberger, D., Lavrac, N. (2000). Reducing misclassification costs. Principles of Data Mining and Knowledge Discovery, 4th European Conference (PKDD 2000). Lyon, France: Springer-Verlag.
- [HettichBay99] Hettich, S., Bay, S. D. (1999). The UCI KDD Archive. Web page at
- [LewisCatlett94] Lewis, D. D., Catlett, J. (1994). Heterogeneous uncertainty sampling for supervised learning. Proceedings of ICML-94, 11th International Conference on Machine Learning. San Francisco: Morgan Kaufmann.
33 Bibliography (2)
- [NelderMead65] Nelder, J., Mead, R. (1965). A simplex method for function minimization. Computer Journal, 7.
- [PMAS94] Pazzani, M. J., Murphy, P., Ali, K., Schulenburg, D. (1994). Trading off coverage for accuracy in forecasts: Applications to clinical data analysis. Proceedings of the AAAI Symposium on AI in Medicine. Stanford, CA.
- [ProvostFawcett98] Provost, F., Fawcett, T. (1998). Robust classification systems for imprecise environments. Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98). AAAI Press.
- [Tortorella00] Tortorella, F. (2000). An optimal reject rule for binary classifiers. Advances in Pattern Recognition, Joint IAPR International Workshops SSPR 2000 and SPR 2000. Alicante, Spain: Springer-Verlag.
- [WittenFrank00] Witten, I. H., Frank, E. (2000). Data Mining: Practical machine learning tools with Java implementations. San Francisco: Morgan Kaufmann.
34 Further Improvements in the Bounded-Abstention and Bounded-Improvement Models
- In previous work, we used general numerical methods to find the solution.
- But the ROCCH is not an arbitrary function; it has special properties. Thus we can do much better and understand the tri-state classifiers better.
- We propose an algorithm and a proof (see paper).
35 Optimal Classifier Path (figure)
- Optimal classifier path for the bounded-abstention model: cost as a function of FP(a) and FP(b).
36 Cost Algorithm: Bounded-Abstention Model (figure)
- Smallest relative gradient path for bounded abstention, over FP(a) and FP(b).
37 k Algorithm: Bounded-Improvement Model (figure)
- Optimal classifier path for bounded improvement, over FP(a) and FP(b).
38 Selecting the Optimal Classifier
- Criterion: minimize the misclassification cost

  rc = (1 / (P + N)) (FN c12 + FP c21)
     = (1 / (P + N)) (P (1 - tp) c12 + FP c21),   with tp = TP / P = f_ROC(fp), fp = FP / N

  d rc / d FP = (1 / (P + N)) (c21 - (P / N) c12 f'_ROC(fp)) = 0

  which gives f'_ROC(fp) = (c21 / c12) * N / P.
39 Cost Matrices
- Theorem: if (*) is not met, the classifier is a trivial binary classifier.

  (c13 < c12) and (c23 < c21) and (c13 c21 + c23 c12 < c12 c21)   (*)

- Proof (sketch):
  - Show that for an optimal classifier f_R(fp*_alpha) <= f_R(fp*) <= f_R(fp*_beta), where fp* corresponds to an optimal binary classifier.
  - Show that if (*) is not met, the partial derivative of rc with respect to fp_alpha is positive for fp*_alpha < fp*, and the partial derivative with respect to fp_beta is positive for fp*_beta > fp*; therefore fp*_alpha = fp* = fp*_beta.
More informationBackground literature. Data Mining. Data mining: what is it?
Background literature Data Mining Lecturer: Peter Lucas Assessment: Written exam at the end of part II Practical assessment Compulsory study material: Transparencies Handouts (mostly on the Web) Course
More informationVariations of Logistic Regression with Stochastic Gradient Descent
Variations of Logistic Regression with Stochastic Gradient Descent Panqu Wang(pawang@ucsd.edu) Phuc Xuan Nguyen(pxn002@ucsd.edu) January 26, 2012 Abstract In this paper, we extend the traditional logistic
More informationModeling High-Dimensional Discrete Data with Multi-Layer Neural Networks
Modeling High-Dimensional Discrete Data with Multi-Layer Neural Networks Yoshua Bengio Dept. IRO Université de Montréal Montreal, Qc, Canada, H3C 3J7 bengioy@iro.umontreal.ca Samy Bengio IDIAP CP 592,
More informationUnsupervised Classification via Convex Absolute Value Inequalities
Unsupervised Classification via Convex Absolute Value Inequalities Olvi L. Mangasarian Abstract We consider the problem of classifying completely unlabeled data by using convex inequalities that contain
More informationSupport Vector Machines (SVM) in bioinformatics. Day 1: Introduction to SVM
1 Support Vector Machines (SVM) in bioinformatics Day 1: Introduction to SVM Jean-Philippe Vert Bioinformatics Center, Kyoto University, Japan Jean-Philippe.Vert@mines.org Human Genome Center, University
More informationLecture 3. STAT161/261 Introduction to Pattern Recognition and Machine Learning Spring 2018 Prof. Allie Fletcher
Lecture 3 STAT161/261 Introduction to Pattern Recognition and Machine Learning Spring 2018 Prof. Allie Fletcher Previous lectures What is machine learning? Objectives of machine learning Supervised and
More informationPart I. Linear Discriminant Analysis. Discriminant analysis. Discriminant analysis
Week 5 Based in part on slides from textbook, slides of Susan Holmes Part I Linear Discriminant Analysis October 29, 2012 1 / 1 2 / 1 Nearest centroid rule Suppose we break down our data matrix as by the
More informationDiscriminative Direction for Kernel Classifiers
Discriminative Direction for Kernel Classifiers Polina Golland Artificial Intelligence Lab Massachusetts Institute of Technology Cambridge, MA 02139 polina@ai.mit.edu Abstract In many scientific and engineering
More informationCost-based classifier evaluation for imbalanced problems
Cost-based classifier evaluation for imbalanced problems Thomas Landgrebe, Pavel Paclík, David M.J. Tax, Serguei Verzakov, and Robert P.W. Duin Elect. Eng., Maths and Comp. Sc., Delft University of Technology,
More informationI D I A P. Online Policy Adaptation for Ensemble Classifiers R E S E A R C H R E P O R T. Samy Bengio b. Christos Dimitrakakis a IDIAP RR 03-69
R E S E A R C H R E P O R T Online Policy Adaptation for Ensemble Classifiers Christos Dimitrakakis a IDIAP RR 03-69 Samy Bengio b I D I A P December 2003 D a l l e M o l l e I n s t i t u t e for Perceptual
More informationSelection of Classifiers based on Multiple Classifier Behaviour
Selection of Classifiers based on Multiple Classifier Behaviour Giorgio Giacinto, Fabio Roli, and Giorgio Fumera Dept. of Electrical and Electronic Eng. - University of Cagliari Piazza d Armi, 09123 Cagliari,
More informationQ1 (12 points): Chap 4 Exercise 3 (a) to (f) (2 points each)
Q1 (1 points): Chap 4 Exercise 3 (a) to (f) ( points each) Given a table Table 1 Dataset for Exercise 3 Instance a 1 a a 3 Target Class 1 T T 1.0 + T T 6.0 + 3 T F 5.0-4 F F 4.0 + 5 F T 7.0-6 F T 3.0-7
More informationPattern-Based Decision Tree Construction
Pattern-Based Decision Tree Construction Dominique Gay, Nazha Selmaoui ERIM - University of New Caledonia BP R4 F-98851 Nouméa cedex, France {dominique.gay, nazha.selmaoui}@univ-nc.nc Jean-François Boulicaut
More informationPredicting Partial Orders: Ranking with Abstention
Predicting Partial Orders: Ranking with Abstention Weiwei Cheng 1,Michaël Rademaker 2, Bernard De Baets 2,andEykeHüllermeier 1 1 Department of Mathematics and Computer Science University of Marburg, Germany
More informationMulticlass Multilabel Classification with More Classes than Examples
Multiclass Multilabel Classification with More Classes than Examples Ohad Shamir Weizmann Institute of Science Joint work with Ofer Dekel, MSR NIPS 2015 Extreme Classification Workshop Extreme Multiclass
More informationPerformance Evaluation
Performance Evaluation Confusion Matrix: Detected Positive Negative Actual Positive A: True Positive B: False Negative Negative C: False Positive D: True Negative Recall or Sensitivity or True Positive
More informationEnsembles of classifiers based on approximate reducts
Fundamenta Informaticae 34 (2014) 1 10 1 IOS Press Ensembles of classifiers based on approximate reducts Jakub Wróblewski Polish-Japanese Institute of Information Technology and Institute of Mathematics,
More informationComparison of Shannon, Renyi and Tsallis Entropy used in Decision Trees
Comparison of Shannon, Renyi and Tsallis Entropy used in Decision Trees Tomasz Maszczyk and W lodzis law Duch Department of Informatics, Nicolaus Copernicus University Grudzi adzka 5, 87-100 Toruń, Poland
More informationVC dimension, Model Selection and Performance Assessment for SVM and Other Machine Learning Algorithms
03/Feb/2010 VC dimension, Model Selection and Performance Assessment for SVM and Other Machine Learning Algorithms Presented by Andriy Temko Department of Electrical and Electronic Engineering Page 2 of
More informationProbabilistic and Logistic Circuits: A New Synthesis of Logic and Machine Learning
Probabilistic and Logistic Circuits: A New Synthesis of Logic and Machine Learning Guy Van den Broeck KULeuven Symposium Dec 12, 2018 Outline Learning Adding knowledge to deep learning Logistic circuits
More informationMachine Learning Linear Classification. Prof. Matteo Matteucci
Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)
More informationDevelopment of a Data Mining Methodology using Robust Design
Development of a Data Mining Methodology using Robust Design Sangmun Shin, Myeonggil Choi, Youngsun Choi, Guo Yi Department of System Management Engineering, Inje University Gimhae, Kyung-Nam 61-749 South
More informationRegularization. CSCE 970 Lecture 3: Regularization. Stephen Scott and Vinod Variyam. Introduction. Outline
Other Measures 1 / 52 sscott@cse.unl.edu learning can generally be distilled to an optimization problem Choose a classifier (function, hypothesis) from a set of functions that minimizes an objective function
More informationA Posteriori Corrections to Classification Methods.
A Posteriori Corrections to Classification Methods. Włodzisław Duch and Łukasz Itert Department of Informatics, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland; http://www.phys.uni.torun.pl/kmk
More informationData Mining and Knowledge Discovery: Practice Notes
Data Mining and Knowledge Discovery: Practice Notes dr. Petra Kralj Novak Petra.Kralj.Novak@ijs.si 7.11.2017 1 Course Prof. Bojan Cestnik Data preparation Prof. Nada Lavrač: Data mining overview Advanced
More information